Automatic Text Decomposition and Structuring

Authors

  • Gerard Salton
  • James Allan
Abstract

Sophisticated text similarity measurements are used to determine relationships between natural-language texts and text segments. The resulting linked hypertext maps are used to identify different text types and text structures, leading to improved text access and utilization. Examples of text decomposition are given for expository and non-expository texts. The vector processing model of retrieval has been used with substantial success to manipulate large collections of natural-language text. In vector processing, texts or text excerpts, as well as requests for information, are represented by sets of terms, or term vectors. Collectively, the terms assigned to a given text are used to represent text content. Substantially identical methods are usable for determining collection structure (by comparing pairs of text vectors with each other and identifying text pairs found to be sufficiently similar), and for retrieving information (by comparing query vectors with the vectors representing the stored items and retrieving items found to be similar to the queries). The results of a similarity computation between a query vector and the stored document vectors can be ranked in decreasing order of the computed query similarity. This makes
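The vector processing steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how terms are weighted or which similarity function is used, so raw term counts and cosine similarity are assumptions made here for illustration, and the function names (`term_vector`, `cosine`, `rank`, `link_pairs`) are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lowercase terms with raw counts (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term vectors (assumed similarity measure)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Retrieval: score every stored text against the query, highest similarity first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(d)), i) for i, d in enumerate(docs)]
    # Sort by decreasing similarity; ties are broken by document index.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def link_pairs(docs, threshold):
    """Collection structure: link every pair of texts whose similarity reaches the threshold."""
    vecs = [term_vector(d) for d in docs]
    return [
        (i, j)
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
        if cosine(vecs[i], vecs[j]) >= threshold
    ]


docs = [
    "text retrieval with term vectors",
    "term vectors represent text content",
    "cooking pasta at home",
]
ranking = rank("term vectors", docs)
links = link_pairs(docs, threshold=0.5)
```

Both functions rely on the same pairwise comparison, which mirrors the abstract's point that substantially identical methods serve retrieval and structure detection. The threshold of 0.5 is an arbitrary illustrative value, not one taken from the paper.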

Related resources

Automatic Structuring of Written Texts

This paper deals with automatic structuring and sentence boundary labelling in natural language texts. We describe the implemented structure tagging algorithm and heuristic rules that are used for automatic or semiautomatic labelling. Inside the detected sentence the algorithm performs a decomposition to clauses and then marks the parts of text which do not form a sentence, i.e. headings, signa...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

A symbolic approach to automatic multiword term structuring

This paper presents a three-level structuring of multiword terms (MWTs) based on lexical inclusion, WordNet similarity and a clustering approach. Term clustering by automatic data analysis methods offers an interesting way of organizing a domain’s knowledge structures, useful for several information-oriented tasks like science and technology watch, text mining, computer-assisted ontology popula...

Semi-Automatic Terminology Ontology Learning Based on Topic Modeling

Ontologies provide features like a common vocabulary, reusability, and machine-readable content, and also allow for semantic search, facilitate agent interaction, and support the ordering and structuring of knowledge for Semantic Web (Web 3.0) applications. However, the challenge in ontology engineering is automatic learning, i.e., there is still a lack of a fully automatic approach from a text corpus or da...

Automatic continuity of almost multiplicative maps between Frechet algebras

For Fréchet algebras $(A, (p_n))$ and $(B, (q_n))$, a linear map $T: A \rightarrow B$ is almost multiplicative with respect to $(p_n)$ and $(q_n)$ if there exists $\varepsilon \geq 0$ such that $q_n(Tab - Ta\,Tb) \leq \varepsilon\, p_n(a)\, p_n(b)$ for all $n \in \mathbb{N}$, $a, b \in A$, and it is called weakly almost multiplicative with respect to $(p_n)$ and $(q_n)$...

Journal:
  • Inf. Process. Manage.

Volume: 32

Publication date: 1994